This page describes three methods of indexing: importing an entire source, adding one document at a time (incremental indexing), and adding data directly to the index as strings.
Importing An Entire Source
'Importing' a website, file-system folder, database or DataSet means that the indexer scans for all available documents, pages or data and indexes everything that matches the import criteria. Reimporting causes the indexer to rescan the source for changes (where possible; otherwise it reindexes everything). To import programmatically, use the appropriate Import method in DocumentIndex:
A complete code example (Form) and more information on import parameters are also available.
//configuration, startURL, pathMatchesToBeIgnored/Included and the database/DataSet arguments are placeholders defined elsewhere
DocumentIndex documentIndex = new DocumentIndex(configuration);
try
{
    //import a website
    documentIndex.ImportWebsite(startURL);
    //or like this
    documentIndex.Import(new WebsiteBasedIndexableSourceRecord(startURL, pathMatchesToBeIgnored, pathMatchesToBeIncluded));
    //or import a file system folder
    string localFolderPath = @"C:\inetpub\wwwroot";
    string virtualPath = "http://localhost/";
    ArrayList targetMatchList = null, ignoreMatchList = null; //ArrayList is in System.Collections
    bool recurseSubFolders = true;
    documentIndex.ImportFileSystemFolder(localFolderPath, virtualPath, targetMatchList, ignoreMatchList, recurseSubFolders);
    //or import a database
    documentIndex.ImportDatabase(sourceType, connectionString, sqlQuery, uniqueColumnName, resultUrlFormat);
    //or import a DataSet (from an assembly)
    documentIndex.ImportCustomDataSet(assemblyFilePath, fullClassName, uniqueColumnName, resultUrlFormat);
}
finally
{
    //always close the index, even if an import throws
    documentIndex.Close();
}
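'configuration, startURL, pathMatchesToBeIgnored/Included and the database/DataSet arguments are placeholders defined elsewhere
Dim documentIndex As New DocumentIndex(configuration)
Try
    'import a website
    documentIndex.ImportWebsite(startURL)
    'or like this
    documentIndex.Import(New WebsiteBasedIndexableSourceRecord(startURL, pathMatchesToBeIgnored, pathMatchesToBeIncluded))
    'or import a file system folder
    Dim localFolderPath As String = "C:\inetpub\wwwroot"
    Dim virtualPath As String = "http://localhost/"
    Dim targetMatchList As ArrayList = Nothing, ignoreMatchList As ArrayList = Nothing
    Dim recurseSubFolders As Boolean = True
    documentIndex.ImportFileSystemFolder(localFolderPath, virtualPath, targetMatchList, ignoreMatchList, recurseSubFolders)
    'or import a database
    documentIndex.ImportDatabase(sourceType, connectionString, sqlQuery, uniqueColumnName, resultUrlFormat)
    'or import a DataSet (from an assembly)
    documentIndex.ImportCustomDataSet(assemblyFilePath, fullClassName, uniqueColumnName, resultUrlFormat)
Finally
    'always close the index, even if an import throws
    documentIndex.Close()
End Try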
To reimport all sources in the index, use:
documentIndex.ReimportIndexableSources()
To reimport one specific source, obtain its IndexableSourceRecord from:
documentIndex.GetIndexableSourceRecords()
and pass that record to:
documentIndex.Import(sourceRecordFromTheList)
Incremental Indexing
Instead of importing an entire source, you can add documents or data to the index incrementally. This is ideal for updating the index as documents are created or uploaded.
DocumentIndex documentIndex = new DocumentIndex(configuration);
try{
documentIndex.AddDocument(new Document("http://some/URL/document", configuration));
} finally {
documentIndex.Close();
}
Dim documentIndex As DocumentIndex = New DocumentIndex(configuration)
Try
documentIndex.AddDocument(New Document("http://some/URL/document", configuration))
Finally
documentIndex.Close()
End Try
Note that AddDocument may not complete in a trivial amount of time. How long it takes depends on many factors, including machine load, document size and type, index size, and whether the index is due for optimization. It is therefore not advisable to call it directly from a web page, because the page will not return to the user until AddDocument has finished.
Adding to the index asynchronously lets your code return immediately (e.g. so that a web application's document upload page returns at once), while the document is queued to be added to the index as soon as possible in the background. To do this, use the AsynchronousQueue class (in namespace Keyoti.SearchEngine.Index), which queues AddDocument operations and runs them in their original order. AsynchronousQueue uses its own instance of DocumentIndex and creates and closes that instance as necessary, so it is important not to have another instance of DocumentIndex open on the same index directory while there are items in the queue.
//...this code could be called in a button event handler in a web page for example
EventHandler finished = delegate(object sender, EventArgs e)
{
//at this point the index directory is unlocked and there are no more items pending adding to the index.
};
AsynchronousQueue.QueueForIndexing(new Document("http://someURL/somepage.aspx", configuration), finished);
AsynchronousQueue.QueueForIndexing(new Document("http://someURL/somepage2.aspx", configuration), finished);
Private Sub MyFunc()
'...this code could be called in a button event handler in a web page for example
Dim finished As EventHandler = AddressOf Me.OnFinished
AsynchronousQueue.QueueForIndexing(New Document("http://someURL/somepage.aspx", configuration), finished)
AsynchronousQueue.QueueForIndexing(New Document("http://someURL/somepage2.aspx", configuration), finished)
End Sub
Private Sub OnFinished(ByVal sender As Object, ByVal e As EventArgs)
'at this point the index directory is unlocked and there are no more items pending adding to the index.
End Sub
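The AsynchronousQueue behaviour described above (operations run one at a time in their original order, and the `finished` handler runs once nothing is pending) can be illustrated with a small, self-contained pattern. The sketch below is a hypothetical illustration in plain .NET, not Keyoti's implementation: `OrderedBackgroundQueue`, `Enqueue` and the `onDrained` callback are invented names, and the queued actions merely stand in for AddDocument or RemoveDocument calls.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration only; this is NOT Keyoti's AsynchronousQueue.
// Queued actions run one at a time, in the order they were added,
// on a single background worker task.
public sealed class OrderedBackgroundQueue
{
    private readonly Queue<Action> _pending = new Queue<Action>();
    private readonly object _sync = new object();
    private readonly Action _onDrained;
    private bool _workerRunning;

    public OrderedBackgroundQueue(Action onDrained)
    {
        _onDrained = onDrained;
    }

    // Returns immediately; the caller (e.g. a web page) does not wait for the work.
    public void Enqueue(Action work)
    {
        lock (_sync)
        {
            _pending.Enqueue(work);
            if (_workerRunning) return; // the running worker will pick this item up
            _workerRunning = true;
        }
        Task.Run(() => ProcessQueue());
    }

    private void ProcessQueue()
    {
        while (true)
        {
            Action next;
            lock (_sync)
            {
                if (_pending.Count == 0)
                {
                    _workerRunning = false;
                    break;
                }
                next = _pending.Dequeue();
            }
            try
            {
                next(); // stands in for an AddDocument or RemoveDocument call
            }
            catch (Exception ex)
            {
                // Keep the worker alive so later items still run; a real queue would log this.
                Console.Error.WriteLine(ex);
            }
        }
        // The queue was empty when this worker stopped: the point at which a
        // shared resource (such as an index directory) could be released.
        _onDrained?.Invoke();
    }
}
```

In this design it is the single worker, rather than one task per item, that preserves the original order; it also means the shared resource is touched by only one consumer at a time, which is why nothing else should hold it open while items are pending.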
Removing Documents
Use the RemoveDocument method in DocumentIndex to remove a document from the index. The document URL must match the URL already in the index exactly, so pay attention to trailing slashes (e.g. http://localhost/) and make sure any spaces are encoded as %20.
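Because RemoveDocument needs an exact URL match, it can help to pass every URL through one canonicalization step before both adding and removing. The following is a minimal sketch using only System.Uri; `IndexUrl.Canonical` is a hypothetical helper name, and it assumes the URLs in your index are in the form that Uri.AbsoluteUri produces. Check that assumption against the URLs actually stored in your index, since the matching itself is done by the search engine.

```csharp
using System;

// Hypothetical helper (not part of the search engine's API): canonicalizes a URL
// so the same string is used when a document is added and when it is removed.
public static class IndexUrl
{
    public static string Canonical(string url)
    {
        // Throws UriFormatException if url is not an absolute URL.
        // AbsoluteUri lower-cases the scheme and host, percent-encodes spaces
        // as %20 and adds "/" to a host-only URL such as "http://localhost".
        // It does NOT add a trailing slash after a path such as "/folder".
        return new Uri(url, UriKind.Absolute).AbsoluteUri;
    }
}
```

Note that folder-style URLs (e.g. http://localhost/folder versus http://localhost/folder/) are left as written, so those still need to be used consistently.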
Removing from the index asynchronously lets your code return immediately (e.g. so that a web application's delete document page returns at once), while the document is queued to be removed from the index as soon as possible in the background. To do this, use the AsynchronousQueue class (in namespace Keyoti.SearchEngine.Index), which queues RemoveDocument operations and runs them in their original order. AsynchronousQueue uses its own instance of DocumentIndex and creates and closes that instance as necessary, so it is important not to have another instance of DocumentIndex open on the same index directory while there are items in the queue.
This is the same queue that the asynchronous adding example uses, so add and remove operations can be mixed.
//...this code could be called in a button event handler in a web page for example
EventHandler finished = delegate(object sender, EventArgs e)
{
//at this point the index directory is unlocked and there are no more items pending in the queue.
};
AsynchronousQueue.QueueForRemoval(new Document("http://someURL/somepage.aspx", configuration), finished);
AsynchronousQueue.QueueForRemoval(new Document("http://someURL/somepage2.aspx", configuration), finished);
Private Sub MyFunc()
'...this code could be called in a button event handler in a web page for example
Dim finished As EventHandler = AddressOf Me.OnFinished
AsynchronousQueue.QueueForRemoval(New Document("http://someURL/somepage.aspx", configuration), finished)
AsynchronousQueue.QueueForRemoval(New Document("http://someURL/somepage2.aspx", configuration), finished)
End Sub
Private Sub OnFinished(ByVal sender As Object, ByVal e As EventArgs)
'at this point the index directory is unlocked and there are no more items pending in the queue.
End Sub
Removing Rows Imported from a Database
When a row is imported from a database, we create our own URI for it. To remove that row (document) from the index, you need to recreate the URI:
IndexableSourceUri uri = new IndexableSourceUri(1, "d4", "col1");
//where 1 is the IndexableSource ID (see below)
//"d4" is the value in the unique field, that identifies the row to delete
//"col1" is the name of the unique field
documentIndex.RemoveDocument(new Document(uri.UriInstance.AbsoluteUri, configuration));
For example, if the imported table contains these rows:
col1   data
-----  -----
a1     blah
b2     some
c3     empty
d4     more
the code above removes the last row (d4) from the index.
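The indexable source ID can be obtained with code like this:
ArrayList recs = documentIndex.GetIndexableSourceRecords();
//the first record is used here; pick the record that corresponds to your database source
var sourceId = ((IndexableSourceRecord)recs[0]).ID;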
Indexing Strings Directly
It is possible to add 'documents' to the index that are defined only by strings. In other words, data can be indexed without actually residing in a document, page, database, etc. This can be useful in a number of scenarios.
To do this, use the PreloadedDocument class, a simple class to which you pass the 'URI' that will identify the indexed data/document, and specify its title, text and custom data, all as strings.
documentIndex.AddDocument(new PreloadedDocument(new Uri(uri), title, text, summary, null, null, null, customData, configuration));
documentIndex.AddDocument(new PreloadedDocument(new Uri(uri), title, text, summary, Nothing, Nothing, Nothing, customData, configuration))
Where:
- 'uri' is the real or fictitious Uri of the 'document'; it can point to an actual document or simply serve as an arbitrary identifier for the indexed data
- 'title' is the title of the document, searchable by the user
- 'text' is the body text, searchable by the user
- 'summary' is used as the result summary if a 'static' summary type is selected in the configuration (otherwise the result summary is generated from the text content based on hits)
- the three null/Nothing arguments are, respectively, the content category list, the location category name and the security group list (please see the API docs)
- 'customData' is any CustomData to be added to the document record
- 'configuration' is the usual configuration object, as used to create the DocumentIndex
To remove a 'document' added with PreloadedDocument, use documentIndex.RemoveDocument, passing in the same Uri that the document was created with.
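Since removal depends on reproducing exactly the same Uri, one option is to derive the identifier from your own record key in a single place. This is a sketch under assumptions: `PreloadedUris.ForProduct` is a hypothetical helper, and the `http://myapp.local/products/` base address is an invented, fictitious scheme (any consistent scheme accepted by your configuration would do).

```csharp
using System;

// Hypothetical helper: builds the identifier Uri for a record indexed with
// PreloadedDocument. Using it both when adding and when removing guarantees
// that the two Uris are identical.
public static class PreloadedUris
{
    // Assumed, illustrative base address; it does not need to be a real site.
    private const string BaseAddress = "http://myapp.local/products/";

    public static Uri ForProduct(string productKey)
    {
        // Escape the key so characters such as spaces cannot break the URL.
        return new Uri(BaseAddress + Uri.EscapeDataString(productKey));
    }
}
```

When adding, pass `PreloadedUris.ForProduct(key)` as the first PreloadedDocument argument in place of `new Uri(uri)`, and use the same call again when removing the document.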